This report explores a dataset containing the chemical properties and quality ratings of red wines.
It tries to understand whether the chemical properties of a red wine play a role in its quality.
## X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1 7.4 0.70 0.00 1.9 0.076
## 2 2 7.8 0.88 0.00 2.6 0.098
## 3 3 7.8 0.76 0.04 2.3 0.092
## 4 4 11.2 0.28 0.56 1.9 0.075
## 5 5 7.4 0.70 0.00 1.9 0.076
## 6 6 7.4 0.66 0.00 1.8 0.075
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality
## 1 5
## 2 5
## 3 5
## 4 6
## 5 5
## 6 5
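The preview above is the kind of output `head()` gives once the CSV has been read into a data frame. The report does not show the file name or the read call, so the following is only a minimal sketch: it writes three rows and four columns copied from the preview to a temporary file and reads them back. The guess that `X` comes from saved row names is an assumption, not something the report states.

```r
# Stand-in for the real CSV: three rows and four columns from the preview above.
demo <- data.frame(
  fixed.acidity    = c(7.4, 7.8, 7.8),
  volatile.acidity = c(0.70, 0.88, 0.76),
  alcohol          = c(9.4, 9.8, 9.8),
  quality          = c(5L, 5L, 5L)
)
csv_path <- tempfile(fileext = ".csv")
# write.csv() stores row names by default, which creates an unnamed first column.
write.csv(demo, csv_path)

# read.csv() names that unnamed column "X" (assumed origin of X in this report).
rwine_demo <- read.csv(csv_path)
head(rwine_demo)
dim(rwine_demo)
```

With the real file, the same `read.csv()`, `head()` and `dim()` calls would apply to `rwine`.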
names(rwine)
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## [1] 1599 13
str(rwine)
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
#Another quick summarized look at the dataset
summary(rwine)
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
There are 1,599 observations with 13 columns. The first column, X, is not a variable but simply a row number. Therefore, there are only 12 variables in the dataset.
Quality will be the main focus of this analysis.
To proceed, the first step is to remove the variable X.
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality
## 1 5
## 2 5
## 3 5
## 4 6
## 5 5
## 6 5
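The report does not show the command used to drop the index column. One common way, sketched here on a hypothetical three-row frame, is to assign `NULL` to it:

```r
# Hypothetical three-row frame with the same kind of index column as the report's data.
rwine_demo <- data.frame(
  X             = 1:3,
  fixed.acidity = c(7.4, 7.8, 7.8),
  quality       = c(5L, 5L, 5L)
)

# Assigning NULL to a column removes it from the data frame.
rwine_demo$X <- NULL
names(rwine_demo)
```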
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
To better categorize the quality of the wines, let’s separate them into 3 categories: undesirable, good and amazing.
In order to do so, let’s first transform quality from integers to an ordered factor.
# Transform quality from an integer to an ordered factor
rwine$quality <- factor(rwine$quality, ordered = TRUE)
# Create the rating variable: below 5 is undesirable, 5 or 6 is good, 7 and above is amazing
rwine$rating <- ifelse(rwine$quality < 5, 'undesirable',
                       ifelse(rwine$quality < 7, 'good', 'amazing'))
# Order the new rating variable
rwine$rating <- ordered(rwine$rating,
                        levels = c('undesirable', 'good', 'amazing'))
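As an alternative sketch (not the approach used in this report), the same three bins can be built in one step with `cut()` on the numeric scores. The example uses the first ten quality scores shown in the `str()` output.

```r
# First ten quality scores shown in the str() output earlier in the report.
quality <- c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)

# Bins are right-closed: (-Inf, 4], (4, 6], (6, Inf].
rating <- cut(
  quality,
  breaks = c(-Inf, 4, 6, Inf),
  labels = c("undesirable", "good", "amazing"),
  ordered_result = TRUE
)
table(rating)
```

Because `cut()` works on numbers, this variant would be applied before quality is turned into a factor.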
As seen from the graph, and confirmed by the summary above, most wines have a quality of 5 or 6, and none are rated below 3 or above 8.
When plotting the rating, it is clear once again that most wines fall within the good category. There are a few undesirable wines and somewhat more amazing wines.
When talking about wine, one of the first things to look at is the alcohol level!
#Get summary for variable alcohol
summary(rwine$alcohol)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The alcohol content has a right-skewed distribution, with most wines toward the lower end. The mean is 10.42. There are a few outliers, the highest being 14.90.
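A quick numeric check for right skew is that the mean sits above the median and that the sample skewness is positive. The sketch below uses a small made-up vector, not the wine data, and a hand-written helper with the hypothetical name `sample_skewness`, based on the standardized third moment.

```r
# Hypothetical alcohol-like values with a long right tail (not the real data).
x <- c(9.2, 9.4, 9.5, 9.8, 10.0, 10.2, 10.5, 11.1, 12.8, 14.9)

# Standardized third central moment; positive values indicate right skew.
sample_skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

mean(x) > median(x)
sample_skewness(x)
```

The same two checks could be run on `rwine$alcohol`, where the summary already shows a mean (10.42) above the median (10.20).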
Next, let’s explore the other variables going down their correlation level.
summary(rwine$fixed.acidity)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
This variable is roughly normally distributed. Most wines have a fixed acidity level between 7 and 9, with a few outliers over 13, all the way up to 15.9.
summary(rwine$sulphates)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
The sulphates level is right skewed, with a mean of 0.66 and some outliers ranging up to 2.
summary(rwine$citric.acid)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
Most of the citric acid levels are under 0.75; more specifically, 75% are under 0.42. There is an outlier with a value of 1.
summary(rwine$total.sulfur.dioxide)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
Once again, the total sulfur dioxide level is right skewed, with the middle half of the wines falling between 22 and 62.
summary(rwine$density)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0037
The density is normally distributed, with a mean of 0.9967.
summary(rwine$chlorides)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
The chlorides level is right skewed. Discounting the outliers, most of the values look roughly normally distributed around the median of 0.079. Using the log10 of the distribution to dampen the effect of the outliers, we have the following graph.
This graph supports the finding that, outliers aside, the chlorides level is approximately normally distributed around its center.
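To illustrate why a log10 scale tames the chlorides tail, the sketch below applies it to the five summary values printed above and measures how far the maximum sits beyond the third quartile, in units of the interquartile range. It is only an illustration on summary values, not a reproduction of the plot, and `tail_in_iqr` is a hypothetical helper.

```r
# Min, 1st quartile, median, 3rd quartile and max of chlorides from summary().
chlorides <- c(0.012, 0.070, 0.079, 0.090, 0.611)

# How far the maximum sits above the 3rd quartile, measured in IQR units.
tail_in_iqr <- function(v) (v[5] - v[4]) / (v[4] - v[2])

tail_in_iqr(chlorides)         # raw scale
tail_in_iqr(log10(chlorides))  # log10 scale
```

The second value is smaller than the first, which is the dampening effect the log10 axis relies on.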
summary(rwine$fixed.acidity)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
The acidity is roughly normally distributed, with a median of 7.90 and a mean of 8.32. As seen on the box plot, there are still a few outliers.
summary(rwine$pH)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
As with the fixed acidity level, the pH level is normally distributed. With a mean of 3.31, it can be seen that red wines are acidic.
summary(rwine$residual.sugar)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
As with the chlorides level, at first look the residual sugar seems to have a right-skewed distribution. But with the help of the box plot, we can see that there are many outliers. Removing them gives another perspective.
From this new plot, it can be seen that, outliers aside, the residual sugar level is roughly normally distributed around its mean.
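The report does not say how the outliers were removed. One common choice, shown here purely as an assumption, is the box plot rule: drop values lying more than 1.5 times the interquartile range beyond the quartiles. The vector is made up for illustration, and `drop_outliers` is a hypothetical helper.

```r
# Hypothetical residual sugar values with two extreme points (not the real data).
sugar <- c(1.6, 1.8, 1.9, 1.9, 2.0, 2.2, 2.3, 2.6, 6.1, 15.5)

# Keep only values inside the box plot whiskers (1.5 * IQR beyond the quartiles).
drop_outliers <- function(v) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  v[v >= q[1] - fence & v <= q[2] + fence]
}

drop_outliers(sugar)
```

Other cut-offs, such as trimming at a fixed percentile, would be equally defensible; the choice affects how normal the remaining values look.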
summary(rwine$free.sulfur.dioxide)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
The free sulfur dioxide level is right skewed. The quartiles are evenly spaced (7, 14 and 21), but the mean of 15.87 sits above the median of 14 and the maximum reaches 72, so the skew comes from a long upper tail.
There are 1,599 observations in the red wine data. The quality of these wines has been rated by 3 wine experts from 0 (very bad) to 10 (excellent). There are 12 variables in total: 11 input variables and 1 output variable.
The input variables are:
- alcohol;
- chlorides;
- citric acid;
- density;
- fixed acidity;
- free sulfur dioxide;
- pH;
- residual sugar;
- sulphates;
- total sulfur dioxide;
- volatile acidity.
Output variable:
- quality
The main feature of interest is the quality of the wine. More specifically, is there a common factor in the wines that are perceived as better quality by the wine experts, and how do the different chemical properties impact the perceived quality?
There are many features that are interesting to look at. A brief correlation analysis suggests that alcohol content, volatile acidity, sulphates and citric acid all play a bigger role in the perceived quality of a wine.
I created a new variable called rating, with the levels “undesirable” (lower than 5), “good” (5 or 6), and “amazing” (7 or higher), in order to counter the lack of spread in the quality rating.
This is a very clean and tidy dataset. Only the “X” column has been deleted since it was not useful. The quality variable has been changed to an ordered factor for visualization, but to perform further analysis, it has been changed back to its numerical value.
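Converting an ordered factor back to numbers has a known pitfall in R: `as.numeric()` on a factor returns the internal level codes, not the labels. The report does not show how the conversion was done, so the following is only a sketch of the safe pattern, using the first ten scores from the `str()` output.

```r
# First ten quality scores from the str() output, stored as an ordered factor.
quality <- factor(c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5), ordered = TRUE)

# Level codes (1, 2, 3, ...) rather than the original scores.
as.numeric(quality)

# Go through the labels to recover the original scores.
quality_num <- as.numeric(as.character(quality))
quality_num
```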
Among the unusual distributions, the chlorides, residual sugar and free sulfur dioxide levels are all skewed to the right. The citric acid was also interesting, as it has spikes at different levels, possibly suggesting that these levels were targeted on purpose, perhaps because many wine producers use the same kind of grapes.
Let’s take a look at which variables are correlated with the quality of the wines.
## [,1]
## fixed.acidity 0.1241
## volatile.acidity -0.3906
## citric.acid 0.2264
## residual.sugar 0.0137
## chlorides -0.1289
## free.sulfur.dioxide -0.0507
## total.sulfur.dioxide -0.1851
## density -0.1749
## pH -0.0577
## sulphates 0.2514
## alcohol 0.4762
## quality 1.0000
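A one-column table like the one above is what `cor()` returns when a data frame is correlated against a single vector. The sketch below uses only the six rows from the preview and three of the predictors, so its values will not match the full-data results.

```r
# Six preview rows from the top of the report (three predictors plus quality).
demo <- data.frame(
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66),
  citric.acid      = c(0.00, 0.00, 0.04, 0.56, 0.00, 0.00),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4),
  quality          = c(5, 5, 5, 6, 5, 5)
)

# Pearson correlation of every column with quality, as a one-column matrix.
quality_cor <- round(cor(x = demo, y = demo$quality), 4)
quality_cor
```

Note that `cor()` needs numeric input, so quality has to be numeric rather than an ordered factor at this point.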
As the correlation analysis demonstrates, alcohol content has the highest correlation with quality, followed by volatile acidity and sulphates. It is interesting to see that pH seems to have almost no impact on the quality of the wine. Let’s take a closer look at a few of those variables.
Since alcohol content has the strongest correlation with quality, let’s start with a closer look at it.
As seen from the graphs above, there is a clear trend: the higher the alcohol content, the higher the quality tends to be. It is even more evident when plotting against the rating, as this compensates for some of the lack of spread.
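The same comparison can be made numerically by taking the median alcohol within each rating. The values below are made up purely to show the mechanics; `demo` and its contents are hypothetical.

```r
# Hypothetical alcohol values and ratings (not the real data).
demo <- data.frame(
  alcohol = c(9.4, 9.8, 10.0, 9.5, 10.2, 10.5, 11.4, 12.0, 12.5),
  rating  = factor(
    rep(c("undesirable", "good", "amazing"), each = 3),
    levels = c("undesirable", "good", "amazing"),
    ordered = TRUE
  )
)

# Median alcohol per rating, in the order of the rating levels.
medians <- tapply(demo$alcohol, demo$rating, median)
medians
```

Calling `boxplot(alcohol ~ rating, data = demo)` would draw the matching box plot, with these medians as the center lines.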
From the 2 box plots above, it can be seen that the volatile acidity and the quality are inversely correlated, meaning that a higher volatile acidity tends to go with a lower perceived quality. Once again, this finding can be seen even more clearly when plotting against the rating.
On the quality plot, it can be seen that the citric acid level has a positive correlation with the quality, where a higher level of citric acid tends to go with a higher perceived quality. This is further reinforced when plotting this variable against the rating variable.
The sulphates level is positively correlated with the quality of the wine. Interestingly enough, the median as well as the mean for quality 7 and 8 are quite similar. This can be interpreted as a plateau: a point beyond which more sulphates no longer help.
In order to test this assumption, let’s do another analysis.
The above plot reinforces the previous assumption. Once the sulphates level is too high, it does not bring any additional positive value to the quality. Rather, it lowers the perceived quality of a wine.
According to the correlation analysis, the correlation between quality and pH is low. From the quality graph, it can be seen that there is only a slight correlation between pH and quality. But on the rating graph, this correlation is more pronounced. It can be noted that the better the quality of a wine, the lower its pH level tends to be.
There is no clear correlation between the residual sugar level and the quality of a wine.
A correlation analysis has been run against the quality of the wines previously. Now let’s take a look at the correlations between the other variables.
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.0000 -0.2561 0.6717
## volatile.acidity -0.2561 1.0000 -0.5525
## citric.acid 0.6717 -0.5525 1.0000
## residual.sugar 0.1148 0.0019 0.1436
## chlorides 0.0937 0.0613 0.2038
## free.sulfur.dioxide -0.1538 -0.0105 -0.0610
## total.sulfur.dioxide -0.1132 0.0765 0.0355
## density 0.6680 0.0220 0.3649
## pH -0.6830 0.2349 -0.5419
## sulphates 0.1830 -0.2610 0.3128
## alcohol -0.0617 -0.2023 0.1099
## quality 0.1241 -0.3906 0.2264
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.1148 0.0937 -0.1538
## volatile.acidity 0.0019 0.0613 -0.0105
## citric.acid 0.1436 0.2038 -0.0610
## residual.sugar 1.0000 0.0556 0.1870
## chlorides 0.0556 1.0000 0.0056
## free.sulfur.dioxide 0.1870 0.0056 1.0000
## total.sulfur.dioxide 0.2030 0.0474 0.6677
## density 0.3553 0.2006 -0.0219
## pH -0.0857 -0.2650 0.0704
## sulphates 0.0055 0.3713 0.0517
## alcohol 0.0421 -0.2211 -0.0694
## quality 0.0137 -0.1289 -0.0507
## total.sulfur.dioxide density pH sulphates
## fixed.acidity -0.1132 0.6680 -0.6830 0.1830
## volatile.acidity 0.0765 0.0220 0.2349 -0.2610
## citric.acid 0.0355 0.3649 -0.5419 0.3128
## residual.sugar 0.2030 0.3553 -0.0857 0.0055
## chlorides 0.0474 0.2006 -0.2650 0.3713
## free.sulfur.dioxide 0.6677 -0.0219 0.0704 0.0517
## total.sulfur.dioxide 1.0000 0.0713 -0.0665 0.0429
## density 0.0713 1.0000 -0.3417 0.1485
## pH -0.0665 -0.3417 1.0000 -0.1966
## sulphates 0.0429 0.1485 -0.1966 1.0000
## alcohol -0.2057 -0.4962 0.2056 0.0936
## quality -0.1851 -0.1749 -0.0577 0.2514
## alcohol quality
## fixed.acidity -0.0617 0.1241
## volatile.acidity -0.2023 -0.3906
## citric.acid 0.1099 0.2264
## residual.sugar 0.0421 0.0137
## chlorides -0.2211 -0.1289
## free.sulfur.dioxide -0.0694 -0.0507
## total.sulfur.dioxide -0.2057 -0.1851
## density -0.4962 -0.1749
## pH 0.2056 -0.0577
## sulphates 0.0936 0.2514
## alcohol 1.0000 0.4762
## quality 0.4762 1.0000
Before doing further analysis, let’s take a look at the above information in graphical form.
Let’s start by taking a look at alcohol, the most relevant indicator of quality, against its most highly correlated variables, density and pH.
This graph shows that there is a negative correlation between the alcohol level and the density level, which is to be expected, as alcohol is less dense than water.
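The direction of this relationship can be checked with a simple linear fit. The sketch below uses only the six preview rows from the top of the report, so the slope is illustrative rather than an estimate for the full data.

```r
# Alcohol and density for the six preview rows shown at the top of the report.
demo <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4),
  density = c(0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9978)
)

# Least-squares line; a negative slope means density falls as alcohol rises.
fit <- lm(density ~ alcohol, data = demo)
slope <- unname(coef(fit)["alcohol"])
slope
cor(demo$alcohol, demo$density)
```

On the full data, the report's correlation matrix gives -0.4962 for this pair.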
As for the pH level vis-a-vis the alcohol level, there is a slight positive correlation between them. As the above graph shows, the pH increases with the alcohol level up to a certain point, and the decrease after that is caused by a single outlier with a high level of alcohol.
As expected, the higher the level of citric acid or fixed acidity, the lower the pH, which is a numerical representation of acidity or basicity where a lower number indicates a more acidic wine. Both fixed acidity and citric acid are negatively correlated with pH, therefore we can hypothesize that they are positively correlated with each other. Let’s take a closer look.
It can be clearly seen that there is a positive correlation between fixed acidity and citric acid. Once again, it is interesting to note how many wines have a citric acid level of 0 or close to 0.
From this graph, we can see that there is a fairly strong negative correlation between volatile acidity and the citric acid level in the wine. It is interesting to note that there is some kind of plateau for citric acid, around 0.5.
There is a slight negative correlation between volatile acidity and fixed acidity.
From the two graphs above, we can see that both the citric acid and fixed acidity are positively correlated with density. This is expected, given that a high citric acid level goes with a high fixed acidity, and that a wine with a higher level of acidity tends to be denser than one with a lower level.
From this graph, we can see that there is a high concentration of both chlorides and sulphates at lower levels, and there seems to be a positive correlation between them.
Once zoomed in, the positive correlation can be seen a bit more clearly. Although there is somewhat of a dip at the 0.125 chlorides level, the general trend is still that of a positive correlation.
The main feature of interest is whether any variable affects the quality of the wine. The variables with a positive correlation include citric acid, sulphates and of course alcohol level, while those with a negative correlation include volatile acidity and, to a lesser degree, the pH level.
More observations:
- Alcohol has the highest correlation with the quality of a wine;
- The residual sugar has almost no correlation with the quality of a wine;
- Alcohol is not highly correlated with any metric other than density, with which it is negatively correlated;
- Volatile acidity is inversely correlated with the quality of a wine, while citric acid is positively correlated with it;
- The pH level, which is affected by the volatile acidity, fixed acidity and citric acid, is weakly negatively correlated with quality, meaning a more sour wine will tend to be ranked slightly higher;
- Sulphates are positively correlated with quality, although too high a concentration of sulphates is considered undesirable.
As expected, fixed acidity, citric acid and pH are all related. Volatile acidity is inversely correlated with both fixed acidity and citric acid. The density also goes up with a higher level of citric acid and fixed acidity.
Another observation is the negative relationship between alcohol and density, which makes sense given that alcohol is less dense than water.
Another interesting observation is that the sulphates level and chlorides level are positively related, with the sulphates playing a direct role in the perceived quality.
The strongest relationship is the one between pH and fixed acidity, while alcohol has the strongest relationship with the quality of the wine.
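Finding the strongest pair can be done mechanically by masking the diagonal of the correlation matrix and taking the largest absolute value. The sketch uses a four-variable excerpt of the matrix printed earlier rather than the full table.

```r
# Excerpt of the printed correlation matrix (values copied from the report).
vars <- c("fixed.acidity", "citric.acid", "density", "pH")
m <- matrix(
  c( 1.0000,  0.6717,  0.6680, -0.6830,
     0.6717,  1.0000,  0.3649, -0.5419,
     0.6680,  0.3649,  1.0000, -0.3417,
    -0.6830, -0.5419, -0.3417,  1.0000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Ignore self-correlations and the duplicate upper triangle.
m[upper.tri(m, diag = TRUE)] <- NA
idx <- which(abs(m) == max(abs(m), na.rm = TRUE), arr.ind = TRUE)
strongest <- c(rownames(m)[idx[1, "row"]], colnames(m)[idx[1, "col"]])
strongest
```

The same masking step works on the full 12 by 12 matrix returned by `cor()`.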
Starting by plotting the two variables most highly correlated with quality, alcohol and volatile acidity, it can be seen that both are clearly related to the perceived quality of the wine. One interesting side note is that too high a volatile acidity will actually lead to a decrease in the perceived quality of the wine.
By further breaking them down by rating, it can be more clearly seen that the higher volatile acidity will actually lead to a lower quality in wine. But despite this, a higher degree of both alcohol and volatile acidity will lead to a higher perceived quality of the wine.
In the above graph, it can be seen that both the alcohol and citric acid positively impact the quality of a wine. However, alcohol has a bigger impact on quality. Citric acid plays an important role, as most of the undesirable wines are found with a low citric acid content.
This graph shows us that a higher volatile acidity with low citric acid will positively impact the quality of a wine. Wines that have a high citric acid level and a medium to low volatile acidity will score lower, which is very interesting to see.
There is a positive relationship between the fixed acidity, citric acid and quality that is clearly demonstrated by this graph.
By further dividing them in their rating category, this relationship is demonstrated even further. Especially in the amazing rating, where both a high citric acid and fixed acidity will lead to a much higher quality.
This graph shows us that the denser the wine, the lower the quality. At the same time, it can be seen that the density and fixed acidity are once again clearly correlated.
From this graph, it can be clearly seen that the amazing category, which contains the higher ratings, tends to have a lower density than the other categories.
Zooming in on this graph and separating them by rating gives us this:
With this, it can be concluded that most wines with the same rating have very similar chlorides and sulphates levels. According to the correlation table, sulphates are positively correlated with quality, while chlorides are weakly negatively correlated with it.
Last but not least, let’s take a look at a graph of the variable most highly correlated with quality, alcohol, with one of the least correlated, pH.
As seen previously, pH has little impact on the quality of the wine, even though it is directly related to fixed acidity, which is related to volatile acidity. This can be seen from the spread of the different quality levels across the range of pH values. The only variable in play here is the alcohol content.
In order to better analyze and understand the relationships between the variables and which combinations contribute to a higher perceived quality, the bivariate plots have been extended with quality and/or rating as an additional dimension.
N/A
Alcohol, being the variable most highly correlated with quality, plays an important role. A higher alcohol content tends to go with a higher quality in wine.
This graph is interesting as it demonstrates that, no matter the quality of the wine, the wines cluster at similar sulphates and chlorides levels. There are also quite a few outliers, meaning that some wines can have a lot of chlorides or sulphates, or both. But in that case, the wines tend to be of comparatively lesser quality than those that are medium to low in each.
Firstly, it is undeniable that alcohol plays a huge role in the quality of the wine, as a good wine will almost always have a high alcohol level. At the same time, although the trend lines show otherwise, it can be seen from the points that the citric acid level also plays a big role, as most of the undesirable wines tend to have a very low level of citric acid. From the above graph, it can be seen that the experts tend to seek a range in the citric acid level: too low or too high a concentration is not recommended.
This is a dataset of 1,599 observations of wine with 12 variables: 11 input variables describing the different chemical properties of the wine and 1 output variable, which is the quality of the wine. The quality is provided by wine experts rating the wines from 0 (worst) to 10 (best).
The first step was to load the csv file and take a quick look at its format, to get a feel for the data and see where to start.
The next step was to do a univariate analysis to better understand each of the variables. It can be seen that most of the quality scores are 5 or 6. Because of the lack of data for the other quality scores, the rating variable was created. Most of the variables roughly follow a normal distribution, which suggests that wines of similar quality will have similar chemical components.
In further analysis, it has been shown that alcohol, volatile acidity, sulphates and citric acid all play a bigger role in the quality of the wine, with alcohol having the highest correlation. In brief, it can be concluded that a wine is more likely to be considered good if it has a higher alcohol content, lower volatile acidity, higher citric acid and higher sulphates. It was also at this stage that it was discovered that pH, despite being correlated with the acidity variables, does not play much of a role in determining the quality of a wine.
The relationships between the variables have also been studied, where, as expected, volatile acidity and fixed acidity are both correlated with the pH level. Another obvious negative correlation is between density and alcohol. More interestingly, chlorides and sulphates are also correlated, with the majority of wines clustered together and a few outliers. This relationship suggests that most wines aim for the same level in those 2 chemical components, with the outliers being hit or miss.
The relationship between sulphates and chlorides has been studied further in the multivariate analysis, where it can be seen that although they do have a direct impact on the quality, the outliers, in other words the wines with a high amount of either, are often considered less desirable. Following the density analysis, it can be seen that better wines are less dense, which is to be expected given that alcohol has a lower density than water. The relationship between the volatile acidity and citric acid with quality indicated that a good wine will have less volatile acidity and higher citric acid. Once again, plotting pH against the strongest correlate, alcohol, revealed that no matter the pH, a higher alcohol content goes with a higher quality of wine.
Taking a look back, there were many limitations to this analysis. First of all, most of the wines have a quality of 5 or 6. It would be beneficial to have a wider spread of quality ratings, to determine the relationships with higher confidence. It would also be interesting to add where the wine is coming from, the kind of grape being used, and also the price tag. As some people say, a great wine does not need to be expensive, and it would be interesting to verify that claim.